vocal tract
Quantification of Tenseness in English and Japanese Tense-Lax Vowels: A Lagrangian Model with Indicator $\theta_1$ and Force of Tenseness $F_{tense}(t)$
The concept of vowel tenseness has traditionally been examined through the binary distinction of tense and lax vowels. However, no universally accepted quantitative definition of tenseness has been established in any language. Previous studies, including those by Jakobson, Fant, and Halle (1951) and Chomsky and Halle (1968), have explored the relationship between vowel tenseness and the vocal tract. Building on these foundations, Ishizaki (2019, 2022) proposed an indirect quantification of vowel tenseness using formant angles $\theta_1$ and $\theta_{F1}$ and their first and second derivatives, $dZ_1(t)/dt = \lim \tan \theta_1(t)$ and $d^2 Z_1(t)/dt^2 = d/dt \lim \tan \theta_1(t)$. This study extends this approach by investigating the potential role of a force-related parameter in determining vowel quality. Specifically, we introduce a simplified model based on the Lagrangian equation to describe the dynamic interaction of the tongue and jaw within the oral cavity during the articulation of close vowels. This model provides a theoretical framework for estimating the forces involved in vowel production across different languages, offering new insights into the physical mechanisms underlying vowel articulation. The findings suggest that this force-based perspective warrants further exploration as a key factor in phonetic and phonological studies.
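As a minimal illustration of the kind of Lagrangian formulation the abstract describes (a sketch under assumed symbols, not the authors' actual model), take a single generalized coordinate $q(t)$ for the combined tongue-jaw displacement, an assumed effective mass $m$, and an assumed stiffness $k$, with the force of tenseness entering as a generalized external force:

$$\frac{d}{dt}\frac{\partial L}{\partial \dot{q}} - \frac{\partial L}{\partial q} = F_{tense}(t), \qquad L = \frac{1}{2} m \dot{q}^2 - \frac{1}{2} k q^2,$$

which reduces to $m \ddot{q} + k q = F_{tense}(t)$; in such a setup, $F_{tense}(t)$ could in principle be estimated from measured articulator kinematics.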
Multimodal Segmentation for Vocal Tract Modeling
Rishi Jain, Bohan Yu, Peter Wu, Tejas Prabhune, Gopala Anumanchipalli
Accurate modeling of the vocal tract is necessary to construct articulatory representations for interpretable speech processing and linguistics. However, vocal tract modeling is challenging because many internal articulators are occluded from external motion capture technologies. Real-time magnetic resonance imaging (RT-MRI) allows measuring precise movements of internal articulators during speech, but annotated datasets of MRI are limited in size due to time-consuming and computationally expensive labeling methods. We first present a deep labeling strategy for the RT-MRI video using a vision-only segmentation approach. We then introduce a multimodal algorithm using audio to improve segmentation of vocal articulators. Together, we set a new benchmark for vocal tract modeling in MRI video segmentation and use this to release labels for a 75-speaker RT-MRI dataset, increasing the amount of labeled public RT-MRI data of the vocal tract by over a factor of 9. The code and dataset labels can be found at \url{rishiraij.github.io/multimodal-mri-avatar/}.
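As a rough illustration of how audio can be fused with video features for articulator segmentation, here is a minimal sketch; the module names, layer sizes, and late-fusion strategy are assumptions for exposition and are not the architecture from the paper.

```python
# Illustrative audio-visual fusion for per-pixel segmentation of RT-MRI frames.
import torch
import torch.nn as nn

class AudioVisualSegmenter(nn.Module):
    def __init__(self, n_classes: int = 8, audio_dim: int = 40):
        super().__init__()
        # Vision encoder: turns a single-channel MRI frame into a feature map.
        self.vision = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Audio encoder: embeds a frame-aligned acoustic feature vector (e.g. mel bands).
        self.audio = nn.Sequential(nn.Linear(audio_dim, 32), nn.ReLU())
        # Per-pixel classifier over fused vision features and broadcast audio features.
        self.head = nn.Conv2d(64, n_classes, kernel_size=1)

    def forward(self, frame: torch.Tensor, audio_feat: torch.Tensor) -> torch.Tensor:
        v = self.vision(frame)                       # (B, 32, H, W)
        a = self.audio(audio_feat)                   # (B, 32)
        a = a[:, :, None, None].expand(-1, -1, v.shape[2], v.shape[3])
        return self.head(torch.cat([v, a], dim=1))   # (B, n_classes, H, W) logits

# Usage: a batch of two 84x84 frames, each paired with a 40-dim audio feature vector.
model = AudioVisualSegmenter()
logits = model(torch.randn(2, 1, 84, 84), torch.randn(2, 40))
print(logits.shape)  # torch.Size([2, 8, 84, 84])
```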
Voice deepfakes are getting easier to spot
New research has shown that voice deepfakes are becoming easier to spot as synthetic recreations of real voices, thanks to the anatomy of our vocal tracts. Researchers at the University of Florida have devised a method of simulating images of a human vocal tract's apparent movements while a voice clip - real or fake - is played back. Professor of Computer and Information Science and Engineering Patrick Traynor and PhD student Logan Blue wrote that they and their colleagues found that simulations prompted by voice deepfakes weren't constrained by "the same anatomical limitations humans have", with some vocal tract measurements having "the same relative diameter and consistency as a drinking straw". Though scientists are starting to spot voice deepfakes with simulation and anatomical comparison, the risk of an ordinary person being tricked by any deepfake - which could lead to identity theft - remains a problem. Ordinary people don't yet have access to these tools.
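To make the idea concrete, a detector in this spirit might estimate cross-sectional diameters along the reconstructed vocal tract and flag values outside a human-plausible range; the sketch below is purely illustrative, and the threshold values are assumptions rather than the Florida team's actual method.

```python
# Illustrative anatomical-plausibility check on estimated vocal-tract diameters.
from typing import Sequence

# Assumed plausibility band for adult vocal-tract cross-sectional diameters (cm).
MIN_DIAMETER_CM = 0.5
MAX_DIAMETER_CM = 4.5

def looks_synthetic(diameters_cm: Sequence[float]) -> bool:
    """Return True if any estimated segment diameter falls outside the assumed human range."""
    return any(d < MIN_DIAMETER_CM or d > MAX_DIAMETER_CM for d in diameters_cm)

# A deepfake might imply straw-like segments narrower than the assumed human minimum.
print(looks_synthetic([0.4, 0.4, 0.4, 0.4]))  # True
print(looks_synthetic([1.2, 2.8, 3.1, 1.9]))  # False
```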
Deepfake audio has a tell
An office worker answers the phone and hears his boss, in a panic, tell him that she forgot to transfer money to the new contractor before she left for the day and needs him to do it. She gives him the wire transfer information, and with the money transferred, the crisis has been averted. The worker sits back in his chair, takes a deep breath, and watches as his boss walks in the door. The voice on the other end of the call was not his boss. The voice he heard was that of an audio deepfake, a machine-generated audio sample designed to sound exactly like his boss.
Harbour seals can learn how to change their voices to seem bigger
Consider the squeak of a mouse and the low rumble of a lion's roar. In the animal kingdom, bigger animals usually produce lower pitch sounds as a result of their larger larynges and longer vocal tracts. But harbour seals seem to break that rule: they can learn how to change their calls. That means they can deliberately move between lower or higher pitch sounds and make themselves sound bigger than they really are. "The information that is in their calls is not necessarily honest," says Koen de Reus at the Max Planck Institute for Psycholinguistics in Nijmegen, Netherlands.
#ICML2021 invited talk round-up 2: randomized controlled trials, encoding speech, and molecular science
In this post, we summarise the final three invited talks from the International Conference on Machine Learning (ICML). These presentations covered: how machine learning can complement randomised controlled trials, encoding and decoding speech, and molecular science. Esther Duflo's work centres on the use of randomised controlled trials (RCTs): she runs policy experiments with the aim of understanding which policies work and which don't, with a particular focus on reducing poverty. Work of this type involves many causal questions, for which there are often many competing ideas, and the field offers little theoretical guidance, so experiments are needed to determine successful policies.
Mind-reading device uses AI to turn brainwaves into audible speech
Electrodes on the brain have been used to translate brainwaves into words spoken by a computer – which could be useful in the future to help people who have lost the ability to speak. When you speak, your brain sends signals from the motor cortex to the muscles in your jaw, lips and larynx to coordinate their movement and produce a sound. "The brain translates the thoughts of what you want to say into movements of the vocal tract, and that's what we're trying to decode," says Edward Chang at the University of California San Francisco (UCSF). He and his colleagues created a two-step process to decode those thoughts using an array of electrodes surgically placed onto the part of the brain that controls movement, and a computer simulation of a vocal tract to reproduce the sounds of speech. In their study, they worked with five participants who had electrodes on the surface of their motor cortex as a part of their treatment for epilepsy.
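The two-step idea can be sketched as a pair of sequence models: one mapping neural features to articulator kinematics, and a second mapping those kinematics to acoustic features for a synthesizer. The layer types and dimensions below are assumptions for illustration, not the UCSF implementation.

```python
# Illustrative two-stage decoder: neural activity -> articulation -> acoustics.
import torch
import torch.nn as nn

class TwoStageSpeechDecoder(nn.Module):
    def __init__(self, n_electrodes: int = 256, n_articulators: int = 33, n_acoustic: int = 32):
        super().__init__()
        # Stage 1: cortical features -> articulator kinematics (lips, jaw, tongue, larynx).
        self.brain_to_articulation = nn.LSTM(n_electrodes, 128, batch_first=True)
        self.artic_out = nn.Linear(128, n_articulators)
        # Stage 2: articulator kinematics -> acoustic features driving a speech synthesizer.
        self.articulation_to_acoustics = nn.LSTM(n_articulators, 128, batch_first=True)
        self.acoustic_out = nn.Linear(128, n_acoustic)

    def forward(self, neural: torch.Tensor) -> torch.Tensor:
        h, _ = self.brain_to_articulation(neural)      # (B, T, 128)
        artic = self.artic_out(h)                      # (B, T, n_articulators)
        h2, _ = self.articulation_to_acoustics(artic)  # (B, T, 128)
        return self.acoustic_out(h2)                   # (B, T, n_acoustic)

# Usage: two trials of 100 time steps of electrode features.
decoder = TwoStageSpeechDecoder()
acoustics = decoder(torch.randn(2, 100, 256))
print(acoustics.shape)  # torch.Size([2, 100, 32])
```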
Implant turns brain signals into synthesized speech
People with neurological conditions who lose the ability to speak can still send the brain signals used to control the speech articulators (such as the lips, jaw and larynx), and UCSF researchers might just use that knowledge to bring voices back. They've crafted a brain-machine interface that can turn those brain signals into mostly recognizable speech. Instead of trying to read thoughts, the machine learning technology picks up on individual nerve commands and translates those to a virtual vocal tract that approximates the intended output. Although the system accurately captures the distinctive sound of someone's voice and is frequently easy to understand, there are times when the synthesizer produces garbled words. It's still miles better than earlier approaches that didn't try to replicate the vocal tract, though.